计算机系统：程序员视角（全球版）：性能至上

现代优化是高层算法选择与底层机器感知之间的协作。尽管 渐近效率 定义了理论极限，但 性能至上 要求我们解决 常数因子 这些编译器无法单独处理的因子。

1. 优化层次结构

成功遵循线性过程：首先，消除 渐近低效 （例如，$O(N^2) \to O(N)$）。接着，解决 优化障碍——主要是 内存别名 以及函数调用开销（如恒定的 边界检查 在 get_vec_element中）。

2. 数据流与约束

编译器出于安全考虑而保守；如果指针 *dest 可能与向量 data重叠，它们就不会进行优化。我们通过 每元素周期数（CPE）来衡量实际运行速度。性能通常用缩放因子建模，例如 $\alpha = 0.974$，其中开销会使执行曲线偏移（例如，$209/\alpha = 39.0$）。

3. 硬件现实

优化需要理解 退休单元 和 关键路径。即使简单的循环也受限于功能单元的 吞吐量限制 或依赖链的 延迟限制 依赖链限制。

TERMINALbash — 80x24

> Ready. Click "Run" to execute.

QUESTION 1

Compare the performance of the three versions of bubblesort (4.47, 4.48, and 4.49). Why does one perform better?

4.49 is fastest because it reduces memory references and leverages conditional moves.

4.47 is fastest because array indexing is always faster than pointer arithmetic.

They all have the same CPE because the Big-O is $O(N^2)$.

4.48 is slowest due to stack spilling.

QUESTION 2

In the function void swap(long *xp, long *yp) using the XOR/Addition trick, what happens if xp == yp?

The values are successfully swapped.

The location is set to 0.

The program throws a segmentation fault.

The compiler optimizes it into a NOP.

✅ Correct!

This is a classic memory aliasing example. If the pointers are identical, the first operation *xp = *xp + *yp doubles the value, and the subsequent subtractions result in 0.

QUESTION 3

Why does the version of polynomial evaluation in Practice Problem 5.5 run faster despite having more operations?

It uses more functional units.

It reduces the depth of the dependency chain (Critical Path).

It eliminates all floating-point multiplications.

It uses loop unrolling by a factor of 10.

QUESTION 4

What does the Retirement Unit do in a modern out-of-order processor?

It fetches instructions from memory.

It maintains sequential semantics and controls register file updates.

It performs floating-point multiplication.

It predicts branch directions.

QUESTION 5

Which of these is a significant 'Optimization Blocker' for GCC?

Asymptotic complexity.

Memory Aliasing.

Variable naming conventions.

The use of long instead of int.